Overview

Dataset statistics

Number of variables: 27
Number of observations: 396030
Missing cells: 81589
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 81.6 MiB
Average record size in memory: 216.0 B
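The two derived figures above can be cross-checked from the raw counts. The following is a minimal Python sketch, assuming the missing-cell percentage is taken over variables × observations and that 1 MiB is 1024 × 1024 bytes. The variable names are illustrative and not part of the report.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 27
n_observations = 396030
missing_cells = 81589
total_size_mib = 81.6

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: MiB means 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```

Both results agree with the reported 0.8% and 216.0 B after rounding.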

Variable types

Numeric: 12
Categorical: 15

Alerts

emp_title has a high cardinality: 173105 distinct values [High cardinality]
issue_d has a high cardinality: 115 distinct values [High cardinality]
title has a high cardinality: 48817 distinct values [High cardinality]
earliest_cr_line has a high cardinality: 684 distinct values [High cardinality]
address has a high cardinality: 393700 distinct values [High cardinality]
loan_amnt is highly correlated with term and 1 other field [High correlation]
installment is highly correlated with loan_amnt [High correlation]
open_acc is highly correlated with total_acc [High correlation]
pub_rec is highly correlated with pub_rec_bankruptcies [High correlation]
total_acc is highly correlated with open_acc [High correlation]
pub_rec_bankruptcies is highly correlated with pub_rec [High correlation]
sub_grade is highly correlated with term and 2 other fields [High correlation]
grade is highly correlated with int_rate and 1 other field [High correlation]
term is highly correlated with loan_amnt and 2 other fields [High correlation]
int_rate is highly correlated with term and 2 other fields [High correlation]
emp_title has 22927 (5.8%) missing values [Missing]
emp_length has 18301 (4.6%) missing values [Missing]
mort_acc has 37795 (9.5%) missing values [Missing]
annual_inc is highly skewed (γ1 = 41.04272475) [Skewed]
dti is highly skewed (γ1 = 431.0512254) [Skewed]
address is uniformly distributed [Uniform]
pub_rec has 338272 (85.4%) zeros [Zeros]
mort_acc has 139777 (35.3%) zeros [Zeros]
pub_rec_bankruptcies has 350380 (88.5%) zeros [Zeros]

Reproduction

Analysis started: 2022-11-29 17:37:52.294375
Analysis finished: 2022-11-29 17:39:18.417791
Duration: 1 minute and 26.12 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json
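A report of this kind can be regenerated roughly as sketched below. This is an assumption-laden illustration, not the exact command used here: the input is assumed to be a CSV file, and `csv_path`, `html_path`, and the report title are hypothetical. `ProfileReport` and `to_file` are the pandas-profiling entry points. The small helper also confirms the reported duration from the two timestamps.

```python
from datetime import datetime


def run_duration_seconds(started: str, finished: str) -> float:
    """Return the elapsed seconds between two report timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    delta = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return delta.total_seconds()


def build_report(csv_path: str, html_path: str) -> None:
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so the duration helper works without these packages.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    ProfileReport(df, title="Overview").to_file(html_path)


elapsed = run_duration_seconds(
    "2022-11-29 17:37:52.294375", "2022-11-29 17:39:18.417791"
)
print(f"{elapsed:.2f} s")
```

The elapsed time comes to about 86.12 seconds, which matches the stated 1 minute and 26.12 seconds.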

Variables

loan_amnt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct1397
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean14113.88809
Minimum500
Maximum40000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum500
5-th percentile3250
Q18000
median12000
Q320000
95-th percentile30975
Maximum40000
Range39500
Interquartile range (IQR)12000

Descriptive statistics

Standard deviation8357.441341
Coefficient of variation (CV)0.5921430926
Kurtosis-0.06259753499
Mean14113.88809
Median Absolute Deviation (MAD)5500
Skewness0.7772854671
Sum5589523100
Variance69846825.77
MonotonicityNot monotonic
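Several of the descriptive statistics for loan_amnt follow directly from the others. The sketch below recomputes them in plain Python, using the standard definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). The variable names are illustrative.

```python
# Figures copied from the loan_amnt statistics above.
n = 396030
mean = 14113.88809
std = 8357.441341
minimum, q1, q3, maximum = 500, 8000, 20000, 40000

value_range = maximum - minimum  # 39500
iqr = q3 - q1                    # 12000
cv = std / mean                  # coefficient of variation
variance = std ** 2
total = mean * n                 # sum of all values

print(value_range, iqr, round(cv, 4), round(variance), round(total))
```

Up to rounding of the inputs, the results agree with the reported range, IQR, CV, variance, and sum.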
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1000027668
 
7.0%
1200021366
 
5.4%
1500019903
 
5.0%
2000018969
 
4.8%
3500014576
 
3.7%
800013539
 
3.4%
600012734
 
3.2%
500012443
 
3.1%
1600010129
 
2.6%
180009195
 
2.3%
Other values (1387)235508
59.5%
ValueCountFrequency (%)
5004
 
< 0.1%
7001
 
< 0.1%
7251
 
< 0.1%
7501
 
< 0.1%
8001
 
< 0.1%
9001
 
< 0.1%
9501
 
< 0.1%
10001448
0.4%
10254
 
< 0.1%
105010
 
< 0.1%
ValueCountFrequency (%)
40000180
< 0.1%
397001
 
< 0.1%
396001
 
< 0.1%
395001
 
< 0.1%
394751
 
< 0.1%
392001
 
< 0.1%
388251
 
< 0.1%
387501
 
< 0.1%
384751
 
< 0.1%
383001
 
< 0.1%

term
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
36 months
302005 
60 months
94025 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters3960300
Distinct characters10
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row 36 months
2nd row 36 months
3rd row 36 months
4th row 36 months
5th row 60 months
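Every term value is 10 characters long, and the character counts for this variable show 792060 spaces over 396030 rows, so each string carries a leading space (" 36 months"). A minimal sketch of converting the column to an integer number of months follows. The function name is hypothetical.

```python
def parse_term_months(term: str) -> int:
    """Convert a string such as ' 36 months' to the integer 36."""
    number, unit = term.split()  # split() discards the leading space
    if unit != "months":
        raise ValueError(f"unexpected term unit: {term!r}")
    return int(number)


print(parse_term_months(" 36 months"), parse_term_months(" 60 months"))
```

With only two distinct values, the parsed column could equally be treated as a binary indicator.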

Common Values

ValueCountFrequency (%)
36 months302005
76.3%
60 months94025
 
23.7%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
months396030
50.0%
36302005
38.1%
6094025
 
11.9%

Most occurring characters

ValueCountFrequency (%)
792060
20.0%
6396030
10.0%
m396030
10.0%
o396030
10.0%
n396030
10.0%
t396030
10.0%
h396030
10.0%
s396030
10.0%
3302005
 
7.6%
094025
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2376180
60.0%
Space Separator792060
 
20.0%
Decimal Number792060
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m396030
16.7%
o396030
16.7%
n396030
16.7%
t396030
16.7%
h396030
16.7%
s396030
16.7%
Decimal Number
ValueCountFrequency (%)
6396030
50.0%
3302005
38.1%
094025
 
11.9%
Space Separator
ValueCountFrequency (%)
792060
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2376180
60.0%
Common1584120
40.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
m396030
16.7%
o396030
16.7%
n396030
16.7%
t396030
16.7%
h396030
16.7%
s396030
16.7%
Common
ValueCountFrequency (%)
792060
50.0%
6396030
25.0%
3302005
 
19.1%
094025
 
5.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII3960300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
792060
20.0%
6396030
10.0%
m396030
10.0%
o396030
10.0%
n396030
10.0%
t396030
10.0%
h396030
10.0%
s396030
10.0%
3302005
 
7.6%
094025
 
2.4%

int_rate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct566
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.63940005
Minimum5.32
Maximum30.99
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum5.32
5-th percentile6.89
Q110.49
median13.33
Q316.49
95-th percentile21.97
Maximum30.99
Range25.67
Interquartile range (IQR)6

Descriptive statistics

Standard deviation4.472157382
Coefficient of variation (CV)0.3278851978
Kurtosis-0.1439465381
Mean13.63940005
Median Absolute Deviation (MAD)3.08
Skewness0.420669472
Sum5401611.6
Variance20.00019165
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
10.9912411
 
3.1%
12.999632
 
2.4%
15.619350
 
2.4%
11.998582
 
2.2%
8.98019
 
2.0%
12.127358
 
1.9%
7.97332
 
1.9%
16.296632
 
1.7%
13.116580
 
1.7%
6.036291
 
1.6%
Other values (556)313843
79.2%
ValueCountFrequency (%)
5.322440
 
0.6%
5.42465
 
0.1%
5.79333
 
0.1%
5.93431
 
0.1%
5.99278
 
0.1%
670
 
< 0.1%
6.036291
1.6%
6.17220
 
0.1%
6.241184
 
0.3%
6.39656
 
0.2%
ValueCountFrequency (%)
30.9913
< 0.1%
30.943
 
< 0.1%
30.893
 
< 0.1%
30.841
 
< 0.1%
30.799
< 0.1%
30.744
 
< 0.1%
30.495
 
< 0.1%
29.997
< 0.1%
29.968
< 0.1%
29.6715
< 0.1%

installment
Real number (ℝ≥0)

HIGH CORRELATION

Distinct55706
Distinct (%)14.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean431.849698
Minimum16.08
Maximum1533.81
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum16.08
5-th percentile109.51
Q1250.33
median375.43
Q3567.3
95-th percentile925.6
Maximum1533.81
Range1517.73
Interquartile range (IQR)316.97

Descriptive statistics

Standard deviation250.7277895
Coefficient of variation (CV)0.5805904014
Kurtosis0.7838199213
Mean431.849698
Median Absolute Deviation (MAD)150.5
Skewness0.9835981609
Sum171025435.9
Variance62864.42443
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
327.34968
 
0.2%
332.1791
 
0.2%
491.01736
 
0.2%
336.9686
 
0.2%
392.81683
 
0.2%
332.72641
 
0.2%
337.47624
 
0.2%
317.54574
 
0.1%
654.68556
 
0.1%
261.88527
 
0.1%
Other values (55696)389244
98.3%
ValueCountFrequency (%)
16.081
< 0.1%
16.251
< 0.1%
16.311
< 0.1%
16.471
< 0.1%
19.871
< 0.1%
20.221
< 0.1%
21.251
< 0.1%
21.621
< 0.1%
21.991
< 0.1%
22.241
< 0.1%
ValueCountFrequency (%)
1533.811
< 0.1%
15271
< 0.1%
1503.851
< 0.1%
1479.491
< 0.1%
1464.421
< 0.1%
1458.251
< 0.1%
1451.142
< 0.1%
1451.122
< 0.1%
1445.91
< 0.1%
1443.761
< 0.1%

grade
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
B
116018 
C
105987 
A
64187 
D
63524 
E
31488 
Other values (2)
14826 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters396030
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowB
4th rowA
5th rowC

Common Values

ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
b116018
29.3%
c105987
26.8%
a64187
16.2%
d63524
16.0%
e31488
 
8.0%
f11772
 
3.0%
g3054
 
0.8%

Most occurring characters

ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter396030
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Latin396030
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII396030
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%

sub_grade
Categorical

HIGH CORRELATION

Distinct35
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
B3
 
26655
B4
 
25601
C1
 
23662
C2
 
22580
B2
 
22495
Other values (30)
275037 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters792060
Distinct characters12
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB4
2nd rowB5
3rd rowB3
4th rowA2
5th rowC5

Common Values

ValueCountFrequency (%)
B326655
 
6.7%
B425601
 
6.5%
C123662
 
6.0%
C222580
 
5.7%
B222495
 
5.7%
B522085
 
5.6%
C321221
 
5.4%
C420280
 
5.1%
B119182
 
4.8%
A518526
 
4.7%
Other values (25)173743
43.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
b326655
 
6.7%
b425601
 
6.5%
c123662
 
6.0%
c222580
 
5.7%
b222495
 
5.7%
b522085
 
5.6%
c321221
 
5.4%
c420280
 
5.1%
b119182
 
4.8%
a518526
 
4.7%
Other values (25)173743
43.9%

Most occurring characters

ValueCountFrequency (%)
B116018
14.6%
C105987
13.4%
181077
10.2%
480849
10.2%
379720
10.1%
279544
10.0%
574840
9.4%
A64187
8.1%
D63524
8.0%
E31488
 
4.0%
Other values (2)14826
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter396030
50.0%
Decimal Number396030
50.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%
Decimal Number
ValueCountFrequency (%)
181077
20.5%
480849
20.4%
379720
20.1%
279544
20.1%
574840
18.9%

Most occurring scripts

ValueCountFrequency (%)
Latin396030
50.0%
Common396030
50.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
B116018
29.3%
C105987
26.8%
A64187
16.2%
D63524
16.0%
E31488
 
8.0%
F11772
 
3.0%
G3054
 
0.8%
Common
ValueCountFrequency (%)
181077
20.5%
480849
20.4%
379720
20.1%
279544
20.1%
574840
18.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII792060
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B116018
14.6%
C105987
13.4%
181077
10.2%
480849
10.2%
379720
10.1%
279544
10.0%
574840
9.4%
A64187
8.1%
D63524
8.0%
E31488
 
4.0%
Other values (2)14826
 
1.9%

emp_title
Categorical

HIGH CARDINALITY
MISSING

Distinct173105
Distinct (%)46.4%
Missing22927
Missing (%)5.8%
Memory size3.0 MiB
Teacher
 
4389
Manager
 
4250
Registered Nurse
 
1856
RN
 
1846
Supervisor
 
1830
Other values (173100)
358932 

Length

Max length78
Median length56
Mean length16.5867361
Min length1

Characters and Unicode

Total characters6188561
Distinct characters125
Distinct categories17 ?
Distinct scripts2 ?
Distinct blocks3 ?

Unique

Unique145247 ?
Unique (%)38.9%

Sample

1st rowMarketing
2nd rowCredit analyst
3rd rowStatistician
4th rowClient Advocate
5th rowDestiny Management Inc.

Common Values

ValueCountFrequency (%)
Teacher4389
 
1.1%
Manager4250
 
1.1%
Registered Nurse1856
 
0.5%
RN1846
 
0.5%
Supervisor1830
 
0.5%
Sales1638
 
0.4%
Project Manager1505
 
0.4%
Owner1410
 
0.4%
Driver1339
 
0.3%
Office Manager1218
 
0.3%
Other values (173095)351822
88.8%
(Missing)22927
 
5.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
manager39270
 
4.7%
of15802
 
1.9%
inc10469
 
1.2%
director9837
 
1.2%
sales9635
 
1.1%
assistant9259
 
1.1%
analyst7652
 
0.9%
specialist7627
 
0.9%
supervisor7501
 
0.9%
engineer7462
 
0.9%
Other values (55359)717784
85.2%

Most occurring characters

ValueCountFrequency (%)
e606206
 
9.8%
487836
 
7.9%
r470449
 
7.6%
a455384
 
7.4%
i406094
 
6.6%
n405205
 
6.5%
t373457
 
6.0%
o330975
 
5.3%
s293945
 
4.7%
c244175
 
3.9%
Other values (115)2114835
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4701345
76.0%
Uppercase Letter939905
 
15.2%
Space Separator487839
 
7.9%
Other Punctuation45687
 
0.7%
Decimal Number6383
 
0.1%
Dash Punctuation5541
 
0.1%
Open Punctuation847
 
< 0.1%
Close Punctuation821
 
< 0.1%
Math Symbol114
 
< 0.1%
Control30
 
< 0.1%
Other values (7)49
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e606206
12.9%
r470449
10.0%
a455384
9.7%
i406094
8.6%
n405205
8.6%
t373457
7.9%
o330975
 
7.0%
s293945
 
6.3%
c244175
 
5.2%
l200056
 
4.3%
Other values (23)915399
19.5%
Uppercase Letter
ValueCountFrequency (%)
S115480
12.3%
C93703
 
10.0%
A87717
 
9.3%
M73150
 
7.8%
P57688
 
6.1%
T54371
 
5.8%
E50241
 
5.3%
I49183
 
5.2%
R48438
 
5.2%
D45784
 
4.9%
Other values (20)264150
28.1%
Other Punctuation
ValueCountFrequency (%)
.18744
41.0%
,9790
21.4%
/7921
17.3%
&6352
 
13.9%
'2523
 
5.5%
#140
 
0.3%
;45
 
0.1%
:43
 
0.1%
!31
 
0.1%
"28
 
0.1%
Other values (7)70
 
0.2%
Control
ValueCountFrequency (%)
€8
26.7%
ƒ7
23.3%
™3
 
10.0%
2
 
6.7%
š2
 
6.7%
’2
 
6.7%
‚2
 
6.7%
œ1
 
3.3%
†1
 
3.3%
…1
 
3.3%
Decimal Number
ValueCountFrequency (%)
11428
22.4%
21303
20.4%
31002
15.7%
4548
 
8.6%
0435
 
6.8%
5409
 
6.4%
6385
 
6.0%
9335
 
5.2%
7321
 
5.0%
8217
 
3.4%
Math Symbol
ValueCountFrequency (%)
+92
80.7%
|16
 
14.0%
~4
 
3.5%
¬1
 
0.9%
<1
 
0.9%
Open Punctuation
ValueCountFrequency (%)
(839
99.1%
[7
 
0.8%
{1
 
0.1%
Close Punctuation
ValueCountFrequency (%)
)816
99.4%
]4
 
0.5%
}1
 
0.1%
Space Separator
ValueCountFrequency (%)
487836
> 99.9%
 3
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$8
72.7%
¢3
 
27.3%
Other Number
ValueCountFrequency (%)
²3
75.0%
³1
 
25.0%
Format
ValueCountFrequency (%)
­1
50.0%
1
50.0%
Dash Punctuation
ValueCountFrequency (%)
-5541
100.0%
Modifier Symbol
ValueCountFrequency (%)
`18
100.0%
Other Symbol
ValueCountFrequency (%)
©7
100.0%
Connector Punctuation
ValueCountFrequency (%)
_6
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5641250
91.2%
Common547311
 
8.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e606206
 
10.7%
r470449
 
8.3%
a455384
 
8.1%
i406094
 
7.2%
n405205
 
7.2%
t373457
 
6.6%
o330975
 
5.9%
s293945
 
5.2%
c244175
 
4.3%
l200056
 
3.5%
Other values (53)1855304
32.9%
Common
ValueCountFrequency (%)
487836
89.1%
.18744
 
3.4%
,9790
 
1.8%
/7921
 
1.4%
&6352
 
1.2%
-5541
 
1.0%
'2523
 
0.5%
11428
 
0.3%
21303
 
0.2%
31002
 
0.2%
Other values (52)4871
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII6188456
> 99.9%
None103
 
< 0.1%
Punctuation2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e606206
 
9.8%
487836
 
7.9%
r470449
 
7.6%
a455384
 
7.4%
i406094
 
6.6%
n405205
 
6.5%
t373457
 
6.0%
o330975
 
5.3%
s293945
 
4.7%
c244175
 
3.9%
Other values (83)2114730
34.2%
None
ValueCountFrequency (%)
Ã21
20.4%
Â10
 
9.7%
€8
 
7.8%
â8
 
7.8%
ƒ7
 
6.8%
©7
 
6.8%
é4
 
3.9%
²3
 
2.9%
™3
 
2.9%
¢3
 
2.9%
Other values (20)29
28.2%
Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
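The non-ASCII characters listed for emp_title (Ã, Â, â, €, ™) are typical of UTF-8 text that was decoded with a single-byte code page. The report does not confirm the cause, so the following is only a hedged sketch, assuming the damage is UTF-8 read as Windows-1252. The function name is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252 damage, if present."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of this kind; leave the value untouched.
        return text


print(repair_mojibake("Caf\u00c3\u00a9"))  # damaged form of "Café"
print(repair_mojibake("Manager"))          # plain ASCII passes through
```

Values that do not round-trip are returned unchanged, so the function can be applied to the whole column without corrupting clean titles.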

emp_length
Categorical

MISSING

Distinct11
Distinct (%)< 0.1%
Missing18301
Missing (%)4.6%
Memory size3.0 MiB
10+ years
126041 
2 years
35827 
< 1 year
31725 
3 years
31665 
5 years
26495 
Other values (6)
125976 

Length

Max length9
Median length7
Mean length7.682830813
Min length6

Characters and Unicode

Total characters2902028
Distinct characters18
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row10+ years
2nd row4 years
3rd row< 1 year
4th row6 years
5th row9 years
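Because emp_length is stored as text ("< 1 year", "4 years", "10+ years"), it is profiled as a categorical variable even though it is ordinal. A minimal sketch of mapping it to integers follows. It assumes "< 1 year" is coded as 0 and "10+ years" as 10, which is a modelling choice and not something the report prescribes. The function name is hypothetical.

```python
import re
from typing import Optional


def parse_emp_length(value: Optional[str]) -> Optional[int]:
    """Map '< 1 year' -> 0, '4 years' -> 4, '10+ years' -> 10, missing -> None."""
    if value is None:
        return None
    text = value.strip()
    if text.startswith("<"):
        return 0  # assumption: under one year is coded as 0
    match = re.match(r"(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised emp_length: {value!r}")
    return int(match.group(1))


print([parse_emp_length(v) for v in ["10+ years", "4 years", "< 1 year", None]])
```

Note that 10 then means "ten or more", so the top category remains right-censored.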

Common Values

ValueCountFrequency (%)
10+ years126041
31.8%
2 years35827
 
9.0%
< 1 year31725
 
8.0%
3 years31665
 
8.0%
5 years26495
 
6.7%
1 year25882
 
6.5%
4 years23952
 
6.0%
6 years20841
 
5.3%
7 years20819
 
5.3%
8 years19168
 
4.8%
(Missing)18301
 
4.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years320122
40.7%
10126041
 
16.0%
157607
 
7.3%
year57607
 
7.3%
235827
 
4.6%
31725
 
4.0%
331665
 
4.0%
526495
 
3.4%
423952
 
3.0%
620841
 
2.6%
Other values (3)55301
 
7.0%

Most occurring characters

ValueCountFrequency (%)
409454
14.1%
y377729
13.0%
e377729
13.0%
a377729
13.0%
r377729
13.0%
s320122
11.0%
1183648
6.3%
0126041
 
4.3%
+126041
 
4.3%
235827
 
1.2%
Other values (8)189979
6.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1831038
63.1%
Decimal Number503770
 
17.4%
Space Separator409454
 
14.1%
Math Symbol157766
 
5.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1183648
36.5%
0126041
25.0%
235827
 
7.1%
331665
 
6.3%
526495
 
5.3%
423952
 
4.8%
620841
 
4.1%
720819
 
4.1%
819168
 
3.8%
915314
 
3.0%
Lowercase Letter
ValueCountFrequency (%)
y377729
20.6%
e377729
20.6%
a377729
20.6%
r377729
20.6%
s320122
17.5%
Math Symbol
ValueCountFrequency (%)
+126041
79.9%
<31725
 
20.1%
Space Separator
ValueCountFrequency (%)
409454
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1831038
63.1%
Common1070990
36.9%

Most frequent character per script

Common
ValueCountFrequency (%)
409454
38.2%
1183648
17.1%
0126041
 
11.8%
+126041
 
11.8%
235827
 
3.3%
<31725
 
3.0%
331665
 
3.0%
526495
 
2.5%
423952
 
2.2%
620841
 
1.9%
Other values (3)55301
 
5.2%
Latin
ValueCountFrequency (%)
y377729
20.6%
e377729
20.6%
a377729
20.6%
r377729
20.6%
s320122
17.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII2902028
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
409454
14.1%
y377729
13.0%
e377729
13.0%
a377729
13.0%
r377729
13.0%
s320122
11.0%
1183648
6.3%
0126041
 
4.3%
+126041
 
4.3%
235827
 
1.2%
Other values (8)189979
6.5%

home_ownership
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
MORTGAGE
198348 
RENT
159790 
OWN
37746 
OTHER
 
112
NONE
 
31

Length

Max length8
Median length8
Mean length5.908327652
Min length3

Characters and Unicode

Total characters2339875
Distinct characters11
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowRENT
2nd rowMORTGAGE
3rd rowRENT
4th rowRENT
5th rowMORTGAGE

Common Values

ValueCountFrequency (%)
MORTGAGE198348
50.1%
RENT159790
40.3%
OWN37746
 
9.5%
OTHER112
 
< 0.1%
NONE31
 
< 0.1%
ANY3
 
< 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
mortgage198348
50.1%
rent159790
40.3%
own37746
 
9.5%
other112
 
< 0.1%
none31
 
< 0.1%
any3
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
G396696
17.0%
E358281
15.3%
R358250
15.3%
T358250
15.3%
O236237
10.1%
A198351
8.5%
M198348
8.5%
N197601
8.4%
W37746
 
1.6%
H112
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2339875
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
G396696
17.0%
E358281
15.3%
R358250
15.3%
T358250
15.3%
O236237
10.1%
A198351
8.5%
M198348
8.5%
N197601
8.4%
W37746
 
1.6%
H112
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin2339875
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
G396696
17.0%
E358281
15.3%
R358250
15.3%
T358250
15.3%
O236237
10.1%
A198351
8.5%
M198348
8.5%
N197601
8.4%
W37746
 
1.6%
H112
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2339875
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G396696
17.0%
E358281
15.3%
R358250
15.3%
T358250
15.3%
O236237
10.1%
A198351
8.5%
M198348
8.5%
N197601
8.4%
W37746
 
1.6%
H112
 
< 0.1%

annual_inc
Real number (ℝ≥0)

SKEWED

Distinct27197
Distinct (%)6.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean74203.1758
Minimum0
Maximum8706582
Zeros1
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile28000
Q145000
median64000
Q390000
95-th percentile150000
Maximum8706582
Range8706582
Interquartile range (IQR)45000

Descriptive statistics

Standard deviation61637.62116
Coefficient of variation (CV)0.8306601503
Kurtosis4238.550572
Mean74203.1758
Median Absolute Deviation (MAD)21000
Skewness41.04272475
Sum2.938668371 × 10^10
Variance3799196342
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6000015313
 
3.9%
5000013303
 
3.4%
6500011333
 
2.9%
7000010674
 
2.7%
4000010629
 
2.7%
4500010114
 
2.6%
800009971
 
2.5%
750009850
 
2.5%
550009195
 
2.3%
900007573
 
1.9%
Other values (27187)288075
72.7%
ValueCountFrequency (%)
01
 
< 0.1%
6001
 
< 0.1%
25001
 
< 0.1%
40002
 
< 0.1%
40801
 
< 0.1%
42001
 
< 0.1%
45241
 
< 0.1%
48006
< 0.1%
48881
 
< 0.1%
50003
< 0.1%
ValueCountFrequency (%)
87065821
< 0.1%
76000001
< 0.1%
74463951
< 0.1%
71417781
< 0.1%
70000001
< 0.1%
65000001
< 0.1%
61000001
< 0.1%
60000002
< 0.1%
50000001
< 0.1%
49000001
< 0.1%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Verified
139563 
Source Verified
131385 
Not Verified
125082 

Length

Max length15
Median length12
Mean length11.58564503
Min length8

Characters and Unicode

Total characters4588263
Distinct characters13
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNot Verified
2nd rowNot Verified
3rd rowSource Verified
4th rowNot Verified
5th rowVerified

Common Values

ValueCountFrequency (%)
Verified139563
35.2%
Source Verified131385
33.2%
Not Verified125082
31.6%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
verified396030
60.7%
source131385
 
20.1%
not125082
 
19.2%

Most occurring characters

ValueCountFrequency (%)
e923445
20.1%
i792060
17.3%
r527415
11.5%
V396030
8.6%
f396030
8.6%
d396030
8.6%
o256467
 
5.6%
256467
 
5.6%
S131385
 
2.9%
u131385
 
2.9%
Other values (3)381549
8.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3679299
80.2%
Uppercase Letter652497
 
14.2%
Space Separator256467
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e923445
25.1%
i792060
21.5%
r527415
14.3%
f396030
10.8%
d396030
10.8%
o256467
 
7.0%
u131385
 
3.6%
c131385
 
3.6%
t125082
 
3.4%
Uppercase Letter
ValueCountFrequency (%)
V396030
60.7%
S131385
 
20.1%
N125082
 
19.2%
Space Separator
ValueCountFrequency (%)
256467
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4331796
94.4%
Common256467
 
5.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e923445
21.3%
i792060
18.3%
r527415
12.2%
V396030
9.1%
f396030
9.1%
d396030
9.1%
o256467
 
5.9%
S131385
 
3.0%
u131385
 
3.0%
c131385
 
3.0%
Other values (2)250164
 
5.8%
Common
ValueCountFrequency (%)
256467
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4588263
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e923445
20.1%
i792060
17.3%
r527415
11.5%
V396030
8.6%
f396030
8.6%
d396030
8.6%
o256467
 
5.6%
256467
 
5.6%
S131385
 
2.9%
u131385
 
2.9%
Other values (3)381549
8.3%

issue_d
Categorical

HIGH CARDINALITY

Distinct115
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Oct-2014
 
14846
Jul-2014
 
12609
Jan-2015
 
11705
Dec-2013
 
10618
Nov-2013
 
10496
Other values (110)
335756 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters3168240
Distinct characters33
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowJan-2015
2nd rowJan-2015
3rd rowJan-2015
4th rowNov-2014
5th rowApr-2013
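The issue_d values are month-year strings such as "Jan-2015", which is why the variable is profiled as a high-cardinality categorical and not as a date. A minimal sketch of parsing them into dates follows. The month table is spelled out to avoid locale-dependent parsing, the day is arbitrarily set to the first of the month, and the names are hypothetical.

```python
from datetime import date

# English month abbreviations, as they appear in the sample rows.
MONTHS = {
    abbr: number
    for number, abbr in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_issue_date(value: str) -> date:
    """Convert 'Jan-2015' to date(2015, 1, 1)."""
    month_abbr, year = value.strip().split("-")
    return date(int(year), MONTHS[month_abbr], 1)


print(parse_issue_date("Jan-2015"), parse_issue_date("Apr-2013"))
```

Once parsed, the values sort chronologically, which the raw strings do not.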

Common Values

ValueCountFrequency (%)
Oct-201414846
 
3.7%
Jul-201412609
 
3.2%
Jan-201511705
 
3.0%
Dec-201310618
 
2.7%
Nov-201310496
 
2.7%
Jul-201510270
 
2.6%
Oct-201310047
 
2.5%
Jan-20149705
 
2.5%
Apr-20159470
 
2.4%
Sep-20139179
 
2.3%
Other values (105)287085
72.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
oct-201414846
 
3.7%
jul-201412609
 
3.2%
jan-201511705
 
3.0%
dec-201310618
 
2.7%
nov-201310496
 
2.7%
jul-201510270
 
2.6%
oct-201310047
 
2.5%
jan-20149705
 
2.5%
apr-20159470
 
2.4%
sep-20139179
 
2.3%
Other values (105)287085
72.5%

Most occurring characters

ValueCountFrequency (%)
2437232
13.8%
0410549
13.0%
1408204
12.9%
-396030
12.5%
J104536
 
3.3%
4102860
 
3.2%
u102670
 
3.2%
a98496
 
3.1%
397662
 
3.1%
594264
 
3.0%
Other values (23)915737
28.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1584120
50.0%
Lowercase Letter792060
25.0%
Dash Punctuation396030
 
12.5%
Uppercase Letter396030
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u102670
13.0%
a98496
12.4%
e85443
10.8%
c71212
9.0%
r65142
8.2%
n64822
8.2%
p60842
7.7%
t42130
 
5.3%
l39714
 
5.0%
o34068
 
4.3%
Other values (4)127521
16.1%
Decimal Number
ValueCountFrequency (%)
2437232
27.6%
0410549
25.9%
1408204
25.8%
4102860
 
6.5%
397662
 
6.2%
594264
 
6.0%
628088
 
1.8%
93826
 
0.2%
81240
 
0.1%
7195
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
J104536
26.4%
A66039
16.7%
M63814
16.1%
O42130
10.6%
N34068
 
8.6%
D29082
 
7.3%
F28742
 
7.3%
S27619
 
7.0%
Dash Punctuation
ValueCountFrequency (%)
-396030
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1980150
62.5%
Latin1188090
37.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
J104536
 
8.8%
u102670
 
8.6%
a98496
 
8.3%
e85443
 
7.2%
c71212
 
6.0%
A66039
 
5.6%
r65142
 
5.5%
n64822
 
5.5%
M63814
 
5.4%
p60842
 
5.1%
Other values (12)405074
34.1%
Common
ValueCountFrequency (%)
2437232
22.1%
0410549
20.7%
1408204
20.6%
-396030
20.0%
4102860
 
5.2%
397662
 
4.9%
594264
 
4.8%
628088
 
1.4%
93826
 
0.2%
81240
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII3168240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2437232
13.8%
0410549
13.0%
1408204
12.9%
-396030
12.5%
J104536
 
3.3%
4102860
 
3.2%
u102670
 
3.2%
a98496
 
3.1%
397662
 
3.1%
594264
 
3.0%
Other values (23)915737
28.9%

loan_status
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Fully Paid
318357 
Charged Off
77673 

Length

Max length11
Median length10
Mean length10.19612908
Min length10

Characters and Unicode

Total characters4037973
Distinct characters16
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFully Paid
2nd rowFully Paid
3rd rowFully Paid
4th rowFully Paid
5th rowCharged Off

Common Values

ValueCountFrequency (%)
Fully Paid318357
80.4%
Charged Off77673
 
19.6%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
fully318357
40.2%
paid318357
40.2%
charged77673
 
9.8%
off77673
 
9.8%

Most occurring characters

ValueCountFrequency (%)
l636714
15.8%
396030
9.8%
a396030
9.8%
d396030
9.8%
F318357
7.9%
u318357
7.9%
y318357
7.9%
P318357
7.9%
i318357
7.9%
f155346
 
3.8%
Other values (6)466038
11.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2849883
70.6%
Uppercase Letter792060
 
19.6%
Space Separator396030
 
9.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l636714
22.3%
a396030
13.9%
d396030
13.9%
u318357
11.2%
y318357
11.2%
i318357
11.2%
f155346
 
5.5%
h77673
 
2.7%
r77673
 
2.7%
g77673
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
F318357
40.2%
P318357
40.2%
C77673
 
9.8%
O77673
 
9.8%
Space Separator
ValueCountFrequency (%)
396030
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3641943
90.2%
Common396030
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
l636714
17.5%
a396030
10.9%
d396030
10.9%
F318357
8.7%
u318357
8.7%
y318357
8.7%
P318357
8.7%
i318357
8.7%
f155346
 
4.3%
C77673
 
2.1%
Other values (5)388365
10.7%
Common
ValueCountFrequency (%)
396030
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4037973
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l636714
15.8%
396030
9.8%
a396030
9.8%
d396030
9.8%
F318357
7.9%
u318357
7.9%
y318357
7.9%
P318357
7.9%
i318357
7.9%
f155346
 
3.8%
Other values (6)466038
11.5%

purpose
Categorical

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
debt_consolidation
234507 
credit_card
83019 
home_improvement
24030 
other
 
21185
major_purchase
 
8790
Other values (9)
24499 

Length

Max length18
Median length18
Mean length14.99784612
Min length3

Characters and Unicode

Total characters5939597
Distinct characters22
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
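
As a rough illustration of how such per-category counts can be obtained, the sketch below uses Python's standard `unicodedata` module on three of this variable's values. It is an assumption about the general approach, not the profiling tool's actual implementation, and `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category code (e.g. 'Ll', 'Pc')."""
    counts = Counter()
    for value in values:
        # unicodedata.category returns the two-letter general category of a character.
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


sample = ["vacation", "debt_consolidation", "credit_card"]
result = category_counts(sample)
print(result)
```

Here `Ll` is the code for Lowercase Letter and `Pc` for Connector Punctuation (the underscore), the two categories reported for this variable.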

Unique

Unique0
Unique (%)0.0%

Sample

1st rowvacation
2nd rowdebt_consolidation
3rd rowcredit_card
4th rowcredit_card
5th rowcredit_card

Common Values

ValueCountFrequency (%)
debt_consolidation234507
59.2%
credit_card83019
 
21.0%
home_improvement24030
 
6.1%
other21185
 
5.3%
major_purchase8790
 
2.2%
small_business5701
 
1.4%
car4697
 
1.2%
medical4196
 
1.1%
moving2854
 
0.7%
vacation2452
 
0.6%
Other values (4)4599
 
1.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
debt_consolidation234507
59.2%
credit_card83019
 
21.0%
home_improvement24030
 
6.1%
other21185
 
5.3%
major_purchase8790
 
2.2%
small_business5701
 
1.4%
car4697
 
1.2%
medical4196
 
1.1%
moving2854
 
0.7%
vacation2452
 
0.6%
Other values (4)4599
 
1.2%

Most occurring characters

ValueCountFrequency (%)
o789320
13.3%
d643129
10.8%
t599957
10.1%
i593335
10.0%
n506778
8.5%
e435403
7.3%
c420937
7.1%
_356376
 
6.0%
a355447
 
6.0%
s268302
 
4.5%
Other values (12)970613
16.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5583221
94.0%
Connector Punctuation356376
 
6.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o789320
14.1%
d643129
11.5%
t599957
10.7%
i593335
10.6%
n506778
9.1%
e435403
7.8%
c420937
7.5%
a355447
6.4%
s268302
 
4.8%
l250691
 
4.5%
Other values (11)719922
12.9%
Connector Punctuation
ValueCountFrequency (%)
_356376
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5583221
94.0%
Common356376
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o789320
14.1%
d643129
11.5%
t599957
10.7%
i593335
10.6%
n506778
9.1%
e435403
7.8%
c420937
7.5%
a355447
6.4%
s268302
 
4.8%
l250691
 
4.5%
Other values (11)719922
12.9%
Common
ValueCountFrequency (%)
_356376
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII5939597
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o789320
13.3%
d643129
10.8%
t599957
10.1%
i593335
10.0%
n506778
8.5%
e435403
7.3%
c420937
7.1%
_356376
 
6.0%
a355447
 
6.0%
s268302
 
4.5%
Other values (12)970613
16.3%

title
Categorical

HIGH CARDINALITY

Distinct48817
Distinct (%)12.4%
Missing1755
Missing (%)0.4%
Memory size3.0 MiB
Debt consolidation
152472 
Credit card refinancing
51487 
Home improvement
15264 
Other
 
12930
Debt Consolidation
 
11608
Other values (48812)
150514 

Length

Max length80
Median length79
Mean length17.24109315
Min length2

Characters and Unicode

Total characters6797732
Distinct characters101
Distinct categories15
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique41798
Unique (%)10.6%

Sample

1st rowVacation
2nd rowDebt consolidation
3rd rowCredit card refinancing
4th rowCredit card refinancing
5th rowCredit Card Refinance

Common Values

ValueCountFrequency (%)
Debt consolidation152472
38.5%
Credit card refinancing51487
 
13.0%
Home improvement15264
 
3.9%
Other12930
 
3.3%
Debt Consolidation11608
 
2.9%
Major purchase4769
 
1.2%
Consolidation3852
 
1.0%
debt consolidation3547
 
0.9%
Business2949
 
0.7%
Debt Consolidation Loan2864
 
0.7%
Other values (48807)132533
33.5%
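
Several of the most common values differ only in capitalisation, which inflates the distinct count. A minimal sketch of merging such variants, assuming case-folding is an acceptable normalisation; the counts are copied from the table above and the variable names are illustrative:

```python
from collections import Counter

# Counts copied from the Common Values table: three spellings of one label.
raw_counts = {
    "Debt consolidation": 152472,
    "Debt Consolidation": 11608,
    "debt consolidation": 3547,
}

merged = Counter()
for label, n in raw_counts.items():
    # Strip surrounding whitespace and fold case so variants share one key.
    merged[label.strip().casefold()] += n

print(merged)
```

Other variants in the table, such as "Consolidation" or "Debt Consolidation Loan", would need rules beyond case-folding.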

Length

Histogram of lengths of the category
ValueCountFrequency (%)
consolidation191014
21.7%
debt190821
21.6%
credit74290
 
8.4%
card68254
 
7.7%
refinancing52262
 
5.9%
loan28112
 
3.2%
home22625
 
2.6%
improvement18786
 
2.1%
other13252
 
1.5%
payoff6685
 
0.8%
Other values (14633)216174
24.5%

Most occurring characters

ValueCountFrequency (%)
o735791
10.8%
n682851
 
10.0%
i655694
 
9.6%
t545268
 
8.0%
e521004
 
7.7%
494561
 
7.3%
a445747
 
6.6%
d386101
 
5.7%
c322828
 
4.7%
r295630
 
4.3%
Other values (91)1712257
25.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5749399
84.6%
Uppercase Letter527589
 
7.8%
Space Separator494561
 
7.3%
Decimal Number13723
 
0.2%
Other Punctuation9147
 
0.1%
Dash Punctuation1929
 
< 0.1%
Connector Punctuation663
 
< 0.1%
Close Punctuation209
 
< 0.1%
Currency Symbol178
 
< 0.1%
Open Punctuation163
 
< 0.1%
Other values (5)171
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o735791
12.8%
n682851
11.9%
i655694
11.4%
t545268
9.5%
e521004
9.1%
a445747
7.8%
d386101
6.7%
c322828
 
5.6%
r295630
 
5.1%
s262921
 
4.6%
Other values (17)895564
15.6%
Uppercase Letter
ValueCountFrequency (%)
D187930
35.6%
C131422
24.9%
L26622
 
5.0%
H24603
 
4.7%
O23248
 
4.4%
P18167
 
3.4%
M17673
 
3.3%
R13527
 
2.6%
B11236
 
2.1%
I9867
 
1.9%
Other values (17)63294
 
12.0%
Other Punctuation
ValueCountFrequency (%)
!2474
27.0%
/1738
19.0%
.1647
18.0%
'1040
11.4%
,848
 
9.3%
&778
 
8.5%
%143
 
1.6%
#132
 
1.4%
:125
 
1.4%
"108
 
1.2%
Other values (5)114
 
1.2%
Decimal Number
Value   Count   Frequency (%)
1       4028    29.4%
2       3446    25.1%
0       3203    23.3%
3       1306    9.5%
4       387     2.8%
5       364     2.7%
9       317     2.3%
6       291     2.1%
7       193     1.4%
8       188     1.4%
Math Symbol
ValueCountFrequency (%)
+103
68.2%
=17
 
11.3%
~9
 
6.0%
<9
 
6.0%
>8
 
5.3%
|5
 
3.3%
Control
ValueCountFrequency (%)
11
73.3%
€2
 
13.3%
™1
 
6.7%
…1
 
6.7%
Close Punctuation
ValueCountFrequency (%)
)205
98.1%
]4
 
1.9%
Open Punctuation
ValueCountFrequency (%)
(158
96.9%
[5
 
3.1%
Modifier Symbol
ValueCountFrequency (%)
`2
66.7%
^1
33.3%
Space Separator
ValueCountFrequency (%)
494561
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1929
100.0%
Connector Punctuation
ValueCountFrequency (%)
_663
100.0%
Currency Symbol
ValueCountFrequency (%)
$178
100.0%
Other Number
ValueCountFrequency (%)
³1
100.0%
Other Symbol
ValueCountFrequency (%)
¦1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6276988
92.3%
Common520744
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
o735791
11.7%
n682851
10.9%
i655694
10.4%
t545268
 
8.7%
e521004
 
8.3%
a445747
 
7.1%
d386101
 
6.2%
c322828
 
5.1%
r295630
 
4.7%
s262921
 
4.2%
Other values (44)1423153
22.7%
Common
ValueCountFrequency (%)
494561
95.0%
14028
 
0.8%
23446
 
0.7%
03203
 
0.6%
!2474
 
0.5%
-1929
 
0.4%
/1738
 
0.3%
.1647
 
0.3%
31306
 
0.3%
'1040
 
0.2%
Other values (37)5372
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6797723
> 99.9%
None9
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o735791
10.8%
n682851
 
10.0%
i655694
 
9.6%
t545268
 
8.0%
e521004
 
7.7%
494561
 
7.3%
a445747
 
6.6%
d386101
 
5.7%
c322828
 
4.7%
r295630
 
4.3%
Other values (84)1712248
25.2%
None
ValueCountFrequency (%)
â2
22.2%
€2
22.2%
™1
11.1%
³1
11.1%
Ã1
11.1%
¦1
11.1%
…1
11.1%

dti
Real number (ℝ≥0)

SKEWED

Distinct4262
Distinct (%)1.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean17.37951365
Minimum0
Maximum9999
Zeros313
Zeros (%)0.1%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile4.68
Q111.28
median16.91
Q322.98
95-th percentile31.58
Maximum9999
Range9999
Interquartile range (IQR)11.7

Descriptive statistics

Standard deviation18.01909234
Coefficient of variation (CV)1.036800725
Kurtosis237923.6765
Mean17.37951365
Median Absolute Deviation (MAD)5.83
Skewness431.0512254
Sum6882808.79
Variance324.6876889
MonotonicityNot monotonic
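
Several of the figures above are derived from one another. The following sketch recomputes the coefficient of variation, variance, and interquartile range from the reported mean, standard deviation, and quartiles; it is a consistency check on the printed numbers, not the tool's own code.

```python
# Values copied from the dti statistics above.
mean = 17.37951365
std = 18.01909234
q1, q3 = 11.28, 22.98

cv = std / mean        # coefficient of variation = standard deviation / mean
variance = std ** 2    # variance is the squared standard deviation
iqr = q3 - q1          # interquartile range = Q3 - Q1

print(round(cv, 6), round(variance, 4), round(iqr, 2))
```

The standard deviation exceeds the IQR here largely because of a few extreme values, such as the maximum of 9999.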
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      313      0.1%
14.4                   310      0.1%
19.2                   302      0.1%
16.8                   301      0.1%
18                     300      0.1%
20.4                   296      0.1%
12                     293      0.1%
13.2                   291      0.1%
21.6                   270      0.1%
15.6                   266      0.1%
Other values (4252)    393088   99.3%

Value   Count   Frequency (%)
0       313     0.1%
0.01    8       < 0.1%
0.02    12      < 0.1%
0.03    5       < 0.1%
0.04    5       < 0.1%
0.05    6       < 0.1%
0.06    7       < 0.1%
0.07    7       < 0.1%
0.08    8       < 0.1%
0.09    4       < 0.1%

Value    Count   Frequency (%)
9999     1       < 0.1%
1622     1       < 0.1%
380.53   1       < 0.1%
189.9    1       < 0.1%
145.65   1       < 0.1%
138.03   1       < 0.1%
120.66   1       < 0.1%
107.55   1       < 0.1%
93.86    1       < 0.1%
92.13    1       < 0.1%

earliest_cr_line
Categorical

HIGH CARDINALITY

Distinct684
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Oct-2000
 
3017
Aug-2000
 
2935
Oct-2001
 
2896
Aug-2001
 
2884
Nov-2000
 
2736
Other values (679)
381562 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters3168240
Distinct characters33
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique45
Unique (%)< 0.1%

Sample

1st rowJun-1990
2nd rowJul-2004
3rd rowAug-2007
4th rowSep-2006
5th rowMar-1999
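
The sample values follow a fixed month-year pattern, so the variable can be read as a date rather than a category. A minimal sketch of parsing the five sample rows with the standard library, assuming English month abbreviations; `parse_month_year` is a hypothetical helper name:

```python
from datetime import datetime


def parse_month_year(text):
    """Parse strings such as 'Jun-1990' into a datetime on the first of the month."""
    # %b matches abbreviated month names in the current locale (English assumed here).
    return datetime.strptime(text, "%b-%Y")


samples = ["Jun-1990", "Jul-2004", "Aug-2007", "Sep-2006", "Mar-1999"]
parsed = [parse_month_year(s) for s in samples]
print(min(parsed).date(), max(parsed).date())
```

Once parsed, the values can be ordered or turned into a numeric feature such as the age of the credit line.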

Common Values

Value                 Count    Frequency (%)
Oct-2000              3017     0.8%
Aug-2000              2935     0.7%
Oct-2001              2896     0.7%
Aug-2001              2884     0.7%
Nov-2000              2736     0.7%
Oct-1999              2726     0.7%
Nov-1999              2700     0.7%
Sep-2000              2691     0.7%
Oct-2002              2640     0.7%
Aug-2002              2599     0.7%
Other values (674)    368206   93.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
oct-20003017
 
0.8%
aug-20002935
 
0.7%
oct-20012896
 
0.7%
aug-20012884
 
0.7%
nov-20002736
 
0.7%
oct-19992726
 
0.7%
nov-19992700
 
0.7%
sep-20002691
 
0.7%
oct-20022640
 
0.7%
aug-20022599
 
0.7%
Other values (674)368206
93.0%

Most occurring characters

ValueCountFrequency (%)
0416557
13.1%
9402384
 
12.7%
-396030
 
12.5%
1253612
 
8.0%
2228260
 
7.2%
e100403
 
3.2%
u99766
 
3.1%
J93111
 
2.9%
a92756
 
2.9%
878297
 
2.5%
Other values (23)1007064
31.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1584120
50.0%
Lowercase Letter792060
25.0%
Dash Punctuation396030
 
12.5%
Uppercase Letter396030
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e100403
12.7%
u99766
12.6%
a92756
11.7%
c71978
9.1%
p66904
8.4%
n61139
7.7%
r60848
7.7%
t38291
 
4.8%
g37349
 
4.7%
v35583
 
4.5%
Other values (4)127043
16.0%
Decimal Number
Value   Count    Frequency (%)
0       416557   26.3%
9       402384   25.4%
1       253612   16.0%
2       228260   14.4%
8       78297    4.9%
7       44922    2.8%
4       40809    2.6%
6       40321    2.5%
3       39568    2.5%
5       39390    2.5%
Uppercase Letter
ValueCountFrequency (%)
J93111
23.5%
A66580
16.8%
M62062
15.7%
O38291
9.7%
S37673
9.5%
N35583
 
9.0%
D33687
 
8.5%
F29043
 
7.3%
Dash Punctuation
ValueCountFrequency (%)
-396030
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1980150
62.5%
Latin1188090
37.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e100403
 
8.5%
u99766
 
8.4%
J93111
 
7.8%
a92756
 
7.8%
c71978
 
6.1%
p66904
 
5.6%
A66580
 
5.6%
M62062
 
5.2%
n61139
 
5.1%
r60848
 
5.1%
Other values (12)412543
34.7%
Common
ValueCountFrequency (%)
0416557
21.0%
9402384
20.3%
-396030
20.0%
1253612
12.8%
2228260
11.5%
878297
 
4.0%
744922
 
2.3%
440809
 
2.1%
640321
 
2.0%
339568
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3168240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0416557
13.1%
9402384
 
12.7%
-396030
 
12.5%
1253612
 
8.0%
2228260
 
7.2%
e100403
 
3.2%
u99766
 
3.1%
J93111
 
2.9%
a92756
 
2.9%
878297
 
2.5%
Other values (23)1007064
31.8%

open_acc
Real number (ℝ≥0)

HIGH CORRELATION

Distinct61
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean11.3111532
Minimum0
Maximum90
Zeros6
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile5
Q18
median10
Q314
95-th percentile21
Maximum90
Range90
Interquartile range (IQR)6

Descriptive statistics

Standard deviation5.137648808
Coefficient of variation (CV)0.4542108766
Kurtosis2.966944774
Mean11.3111532
Median Absolute Deviation (MAD)3
Skewness1.213018844
Sum4479556
Variance26.39543527
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
9                    36779    9.3%
10                   35441    8.9%
8                    35137    8.9%
11                   32695    8.3%
7                    31328    7.9%
12                   29157    7.4%
6                    25927    6.5%
13                   24983    6.3%
14                   21173    5.3%
5                    18308    4.6%
Other values (51)    105102   26.5%

Value   Count   Frequency (%)
0       6       < 0.1%
1       85      < 0.1%
2       1459    0.4%
3       4783    1.2%
4       10709   2.7%
5       18308   4.6%
6       25927   6.5%
7       31328   7.9%
8       35137   8.9%
9       36779   9.3%

Value   Count   Frequency (%)
90      1       < 0.1%
76      2       < 0.1%
58      1       < 0.1%
57      1       < 0.1%
56      2       < 0.1%
55      2       < 0.1%
54      3       < 0.1%
53      6       < 0.1%
52      3       < 0.1%
51      4       < 0.1%

pub_rec
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.1781910461
Minimum0
Maximum86
Zeros338272
Zeros (%)85.4%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum86
Range86
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.5306706005
Coefficient of variation (CV)2.978099136
Kurtosis1867.466643
Mean0.1781910461
Median Absolute Deviation (MAD)0
Skewness16.5765642
Sum70569
Variance0.2816112862
MonotonicityNot monotonic
Histogram with fixed size bins (bins=20)
Value                Count    Frequency (%)
0                    338272   85.4%
1                    49739    12.6%
2                    5476     1.4%
3                    1521     0.4%
4                    527      0.1%
5                    237      0.1%
6                    122      < 0.1%
7                    56       < 0.1%
8                    34       < 0.1%
9                    12       < 0.1%
Other values (10)    34       < 0.1%

Value   Count    Frequency (%)
0       338272   85.4%
1       49739    12.6%
2       5476     1.4%
3       1521     0.4%
4       527      0.1%
5       237      0.1%
6       122      < 0.1%
7       56       < 0.1%
8       34       < 0.1%
9       12       < 0.1%

Value   Count   Frequency (%)
86      1       < 0.1%
40      1       < 0.1%
24      1       < 0.1%
19      2       < 0.1%
17      1       < 0.1%
15      1       < 0.1%
13      4       < 0.1%
12      4       < 0.1%
11      8       < 0.1%
10      11      < 0.1%

revol_bal
Real number (ℝ≥0)

Distinct55622
Distinct (%)14.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15844.53985
Minimum0
Maximum1743266
Zeros2128
Zeros (%)0.5%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile1685
Q16025
median11181
Q319620
95-th percentile41066.55
Maximum1743266
Range1743266
Interquartile range (IQR)13595

Descriptive statistics

Standard deviation20591.83611
Coefficient of variation (CV)1.299617174
Kurtosis384.2210931
Mean15844.53985
Median Absolute Deviation (MAD)6112
Skewness11.72751512
Sum6274913118
Variance424023714.3
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
0                       2128     0.5%
5655                    41       < 0.1%
6095                    38       < 0.1%
7792                    38       < 0.1%
3953                    37       < 0.1%
5098                    36       < 0.1%
6077                    36       < 0.1%
8502                    35       < 0.1%
5235                    35       < 0.1%
5389                    35       < 0.1%
Other values (55612)    393571   99.4%

Value   Count   Frequency (%)
0       2128    0.5%
1       30      < 0.1%
2       26      < 0.1%
3       28      < 0.1%
4       20      < 0.1%
5       23      < 0.1%
6       30      < 0.1%
7       21      < 0.1%
8       21      < 0.1%
9       23      < 0.1%

Value     Count   Frequency (%)
1743266   1       < 0.1%
1298783   1       < 0.1%
1190046   1       < 0.1%
1030826   1       < 0.1%
1023940   1       < 0.1%
975800    1       < 0.1%
867528    1       < 0.1%
838698    1       < 0.1%
814300    1       < 0.1%
778614    1       < 0.1%

revol_util
Real number (ℝ≥0)

Distinct1226
Distinct (%)0.3%
Missing276
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean53.79174864
Minimum0
Maximum892.3
Zeros2213
Zeros (%)0.6%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile11.2
Q135.8
median54.8
Q372.9
95-th percentile92
Maximum892.3
Range892.3
Interquartile range (IQR)37.1

Descriptive statistics

Standard deviation24.45219306
Coefficient of variation (CV)0.4545714479
Kurtosis2.71227821
Mean53.79174864
Median Absolute Deviation (MAD)18.5
Skewness-0.07177802033
Sum21288299.69
Variance597.9097456
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      2213     0.6%
53                     752      0.2%
60                     739      0.2%
61                     734      0.2%
55                     730      0.2%
54                     725      0.2%
62                     721      0.2%
47                     720      0.2%
57                     719      0.2%
58                     717      0.2%
Other values (1216)    386984   97.7%

Value   Count   Frequency (%)
0       2213    0.6%
0.01    1       < 0.1%
0.04    1       < 0.1%
0.05    1       < 0.1%
0.1     253     0.1%
0.16    1       < 0.1%
0.2     211     0.1%
0.3     187     < 0.1%
0.4     189     < 0.1%
0.46    1       < 0.1%

Value   Count   Frequency (%)
892.3   1       < 0.1%
153     1       < 0.1%
152.5   1       < 0.1%
150.7   1       < 0.1%
148     1       < 0.1%
146.1   1       < 0.1%
145.8   1       < 0.1%
140.4   1       < 0.1%
136.7   1       < 0.1%
132.1   1       < 0.1%

total_acc
Real number (ℝ≥0)

HIGH CORRELATION

Distinct118
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean25.41474383
Minimum2
Maximum151
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum2
5-th percentile9
Q117
median24
Q332
95-th percentile47
Maximum151
Range149
Interquartile range (IQR)15

Descriptive statistics

Standard deviation11.88699072
Coefficient of variation (CV)0.4677202651
Kurtosis1.204620014
Mean25.41474383
Median Absolute Deviation (MAD)8
Skewness0.8643276369
Sum10065001
Variance141.3005485
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
21                    14280    3.6%
22                    14260    3.6%
20                    14228    3.6%
23                    13923    3.5%
24                    13878    3.5%
19                    13876    3.5%
18                    13710    3.5%
17                    13495    3.4%
25                    13225    3.3%
26                    12799    3.2%
Other values (108)    258356   65.2%

Value   Count   Frequency (%)
2       18      < 0.1%
3       327     0.1%
4       1238    0.3%
5       2028    0.5%
6       2923    0.7%
7       4143    1.0%
8       5365    1.4%
9       6362    1.6%
10      7672    1.9%
11      8844    2.2%

Value   Count   Frequency (%)
151     1       < 0.1%
150     1       < 0.1%
135     1       < 0.1%
129     1       < 0.1%
124     1       < 0.1%
118     1       < 0.1%
117     1       < 0.1%
116     2       < 0.1%
115     1       < 0.1%
111     2       < 0.1%

initial_list_status
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
f
238066 
w
157964 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters396030
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st roww
2nd rowf
3rd rowf
4th rowf
5th rowf

Common Values

ValueCountFrequency (%)
f238066
60.1%
w157964
39.9%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
f238066
60.1%
w157964
39.9%

Most occurring characters

ValueCountFrequency (%)
f238066
60.1%
w157964
39.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter396030
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
f238066
60.1%
w157964
39.9%

Most occurring scripts

ValueCountFrequency (%)
Latin396030
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
f238066
60.1%
w157964
39.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII396030
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
f238066
60.1%
w157964
39.9%

application_type
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
INDIVIDUAL
395319 
JOINT
 
425
DIRECT_PAY
 
286

Length

Max length10
Median length10
Mean length9.994634245
Min length5

Characters and Unicode

Total characters3958175
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowINDIVIDUAL
2nd rowINDIVIDUAL
3rd rowINDIVIDUAL
4th rowINDIVIDUAL
5th rowINDIVIDUAL

Common Values

ValueCountFrequency (%)
INDIVIDUAL395319
99.8%
JOINT425
 
0.1%
DIRECT_PAY286
 
0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
individual395319
99.8%
joint425
 
0.1%
direct_pay286
 
0.1%

Most occurring characters

ValueCountFrequency (%)
I1186668
30.0%
D790924
20.0%
N395744
 
10.0%
A395605
 
10.0%
V395319
 
10.0%
U395319
 
10.0%
L395319
 
10.0%
T711
 
< 0.1%
J425
 
< 0.1%
O425
 
< 0.1%
Other values (6)1716
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter3957889
> 99.9%
Connector Punctuation286
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I1186668
30.0%
D790924
20.0%
N395744
 
10.0%
A395605
 
10.0%
V395319
 
10.0%
U395319
 
10.0%
L395319
 
10.0%
T711
 
< 0.1%
J425
 
< 0.1%
O425
 
< 0.1%
Other values (5)1430
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
_286
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3957889
> 99.9%
Common286
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
I1186668
30.0%
D790924
20.0%
N395744
 
10.0%
A395605
 
10.0%
V395319
 
10.0%
U395319
 
10.0%
L395319
 
10.0%
T711
 
< 0.1%
J425
 
< 0.1%
O425
 
< 0.1%
Other values (5)1430
 
< 0.1%
Common
ValueCountFrequency (%)
_286
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3958175
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I1186668
30.0%
D790924
20.0%
N395744
 
10.0%
A395605
 
10.0%
V395319
 
10.0%
U395319
 
10.0%
L395319
 
10.0%
T711
 
< 0.1%
J425
 
< 0.1%
O425
 
< 0.1%
Other values (6)1716
 
< 0.1%

mort_acc
Real number (ℝ≥0)

MISSING
ZEROS

Distinct33
Distinct (%)< 0.1%
Missing37795
Missing (%)9.5%
Infinite0
Infinite (%)0.0%
Mean1.813990816
Minimum0
Maximum34
Zeros139777
Zeros (%)35.3%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q33
95-th percentile6
Maximum34
Range34
Interquartile range (IQR)3

Descriptive statistics

Standard deviation2.147930467
Coefficient of variation (CV)1.184091148
Kurtosis4.477175726
Mean1.813990816
Median Absolute Deviation (MAD)1
Skewness1.600132438
Sum649835
Variance4.613605292
MonotonicityNot monotonic
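
Because 9.5% of this variable is missing, it helps to check which denominator the mean uses. The sketch below divides the reported Sum by the number of non-missing observations and reproduces the reported Mean, which suggests that missing values are excluded rather than counted as zero. It is a check on the printed figures only; the variable names are illustrative.

```python
# Values copied from this variable's summary and the dataset overview.
n_observations = 396030
n_missing = 37795
total_sum = 649835

n_present = n_observations - n_missing   # rows with a recorded value
mean_present = total_sum / n_present     # mean over non-missing rows
mean_all = total_sum / n_observations    # what the mean would be if missing counted as 0

print(n_present, round(mean_present, 6), round(mean_all, 6))
```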
Histogram with fixed size bins (bins=33)
Value                Count    Frequency (%)
0                    139777   35.3%
1                    60416    15.3%
2                    49948    12.6%
3                    38049    9.6%
4                    27887    7.0%
5                    18194    4.6%
6                    11069    2.8%
7                    6052     1.5%
8                    3121     0.8%
9                    1656     0.4%
Other values (23)    2066     0.5%
(Missing)            37795    9.5%

Value   Count    Frequency (%)
0       139777   35.3%
1       60416    15.3%
2       49948    12.6%
3       38049    9.6%
4       27887    7.0%
5       18194    4.6%
6       11069    2.8%
7       6052     1.5%
8       3121     0.8%
9       1656     0.4%

Value   Count   Frequency (%)
34      1       < 0.1%
32      2       < 0.1%
31      2       < 0.1%
30      1       < 0.1%
28      1       < 0.1%
27      3       < 0.1%
26      2       < 0.1%
25      4       < 0.1%
24      10      < 0.1%
23      2       < 0.1%

pub_rec_bankruptcies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct9
Distinct (%)< 0.1%
Missing535
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean0.1216475556
Minimum0
Maximum8
Zeros350380
Zeros (%)88.5%
Negative0
Negative (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum8
Range8
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.3561742766
Coefficient of variation (CV)2.927919718
Kurtosis18.10416044
Mean0.1216475556
Median Absolute Deviation (MAD)0
Skewness3.423440368
Sum48111
Variance0.1268601153
MonotonicityNot monotonic
Histogram with fixed size bins (bins=9)
Value        Count    Frequency (%)
0            350380   88.5%
1            42790    10.8%
2            1847     0.5%
3            351      0.1%
4            82       < 0.1%
5            32       < 0.1%
6            7        < 0.1%
7            4        < 0.1%
8            2        < 0.1%
(Missing)    535      0.1%

Value   Count    Frequency (%)
0       350380   88.5%
1       42790    10.8%
2       1847     0.5%
3       351      0.1%
4       82       < 0.1%
5       32       < 0.1%
6       7        < 0.1%
7       4        < 0.1%
8       2        < 0.1%

Value   Count    Frequency (%)
8       2        < 0.1%
7       4        < 0.1%
6       7        < 0.1%
5       32       < 0.1%
4       82       < 0.1%
3       351      0.1%
2       1847     0.5%
1       42790    10.8%
0       350380   88.5%

address
Categorical

HIGH CARDINALITY
UNIFORM

Distinct393700
Distinct (%)99.4%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
USCGC Smith FPO AE 70466
 
8
USS Johnson FPO AE 48052
 
8
USNS Johnson FPO AE 05113
 
8
USS Smith FPO AP 70466
 
8
USNS Johnson FPO AP 48052
 
7
Other values (393695)
395991 

Length

Max length69
Median length60
Mean length44.71395096
Min length20

Characters and Unicode

Total characters17708066
Distinct characters67
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique391984
Unique (%)99.0%

Sample

1st row0174 Michelle Gateway Mendozaberg, OK 22690
2nd row1076 Carney Fort Apt. 347 Loganmouth, SD 05113
3rd row87025 Mark Dale Apt. 269 New Sabrina, WV 05113
4th row823 Reid Ford Delacruzside, MA 00813
5th row679 Luna Roads Greggshire, VA 11650
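
Although almost every address is unique, each sample ends in a five-digit code, and a small set of such codes recurs very often in this variable. A minimal sketch of pulling out that trailing token, assuming it is always the last whitespace-separated part of the string; `zip_code` is a hypothetical helper name:

```python
def zip_code(address):
    """Return the last whitespace-separated token of an address string."""
    # rsplit with no separator splits on any whitespace, including line breaks.
    return address.rsplit(maxsplit=1)[-1]


samples = [
    "0174 Michelle Gateway Mendozaberg, OK 22690",
    "1076 Carney Fort Apt. 347 Loganmouth, SD 05113",
    "USCGC Smith FPO AE 70466",
]
zips = [zip_code(a) for a in samples]
print(zips)
```

Keeping the result as a string preserves leading zeros, as in 05113.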

Common Values

Value                        Count    Frequency (%)
USCGC Smith FPO AE 70466     8        < 0.1%
USS Johnson FPO AE 48052     8        < 0.1%
USNS Johnson FPO AE 05113    8        < 0.1%
USS Smith FPO AP 70466       8        < 0.1%
USNS Johnson FPO AP 48052    7        < 0.1%
USNV Smith FPO AA 00813      6        < 0.1%
USCGC Smith FPO AA 70466     6        < 0.1%
USCGC Jones FPO AE 22690     6        < 0.1%
USNS Johnson FPO AA 70466    6        < 0.1%
USNV Smith FPO AE 30723      6        < 0.1%
Other values (393690)        395961   > 99.9%

Length

Histogram of lengths of the category
Value                    Count     Frequency (%)
suite                    88417     3.0%
apt                      88400     3.0%
70466                    56986     2.0%
30723                    56548     1.9%
22690                    56527     1.9%
48052                    55920     1.9%
00813                    45826     1.6%
29597                    45472     1.6%
05113                    45403     1.6%
box                      28349     1.0%
Other values (108604)    2352838   80.6%

Most occurring characters

ValueCountFrequency (%)
2128626
 
12.0%
e911545
 
5.1%
a735427
 
4.2%
t702787
 
4.0%
r656748
 
3.7%
0624825
 
3.5%
i580043
 
3.3%
o579480
 
3.3%
n551350
 
3.1%
2487525
 
2.8%
Other values (57)9749710
55.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7690715
43.4%
Decimal Number4151920
23.4%
Uppercase Letter2488639
 
14.1%
Space Separator2128626
 
12.0%
Control792060
 
4.5%
Other Punctuation456106
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e911545
11.9%
a735427
9.6%
t702787
 
9.1%
r656748
 
8.5%
i580043
 
7.5%
o579480
 
7.5%
n551350
 
7.2%
s471608
 
6.1%
l400273
 
5.2%
h341828
 
4.4%
Other values (16)1759626
22.9%
Uppercase Letter
ValueCountFrequency (%)
A295950
 
11.9%
S274289
 
11.0%
P164644
 
6.6%
M161767
 
6.5%
C157893
 
6.3%
N148622
 
6.0%
D106785
 
4.3%
L105374
 
4.2%
W94264
 
3.8%
R93408
 
3.8%
Other values (16)885643
35.6%
Decimal Number
Value   Count    Frequency (%)
0       624825   15.0%
2       487525   11.7%
3       443992   10.7%
6       421262   10.1%
7       387522   9.3%
1       375962   9.1%
9       375452   9.0%
5       375301   9.0%
4       330279   8.0%
8       329800   7.9%
Control
ValueCountFrequency (%)
396030
50.0%
396030
50.0%
Other Punctuation
ValueCountFrequency (%)
,367706
80.6%
.88400
 
19.4%
Space Separator
ValueCountFrequency (%)
2128626
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10179354
57.5%
Common7528712
42.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e911545
 
9.0%
a735427
 
7.2%
t702787
 
6.9%
r656748
 
6.5%
i580043
 
5.7%
o579480
 
5.7%
n551350
 
5.4%
s471608
 
4.6%
l400273
 
3.9%
h341828
 
3.4%
Other values (42)4248265
41.7%
Common
ValueCountFrequency (%)
2128626
28.3%
0624825
 
8.3%
2487525
 
6.5%
3443992
 
5.9%
6421262
 
5.6%
396030
 
5.3%
396030
 
5.3%
7387522
 
5.1%
1375962
 
5.0%
9375452
 
5.0%
Other values (5)1491486
19.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII17708066
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2128626
 
12.0%
e911545
 
5.1%
a735427
 
4.2%
t702787
 
4.0%
r656748
 
3.7%
0624825
 
3.5%
i580043
 
3.3%
o579480
 
3.3%
n551350
 
3.1%
2487525
 
2.8%
Other values (57)9749710
55.1%

Interactions

2022-11-29T23:08:52.737954image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:55.967644image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:58.757810image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:02.422525image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:05.278476image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:07.897840image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:10.517890image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:37.058101image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:41.529637image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:44.288205image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:46.871127image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:50.217772image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:52.969501image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:56.163571image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:59.009762image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:02.737783image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:05.493703image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:08.093700image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:10.702429image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:37.270707image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:41.738206image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:44.522825image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:47.077881image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:50.429776image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:53.240530image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:56.377602image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:59.237673image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:02.969316image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:05.707515image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:08.287893image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:10.917860image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:37.513030image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:42.012940image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:44.748003image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:47.303110image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:50.658484image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:53.499019image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:56.647593image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:59.487650image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:03.197468image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:05.949361image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:08.513801image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:11.107306image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:37.729869image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:42.227582image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:44.962770image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:47.531791image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:50.867820image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:53.917779image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:56.857716image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:59.966207image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:03.412760image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:06.167541image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:08.712728image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:11.312540image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:37.948338image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:42.494855image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:45.177925image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:47.798096image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:51.087863image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:54.266300image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:57.057525image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:00.196663image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:03.630070image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:06.389704image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:08.921146image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:11.517604image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:38.178279image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:42.732510image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:45.397871image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:48.331884image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:51.321668image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:54.662723image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:08:57.297811image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:00.448481image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:03.857355image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:06.617670image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-11-29T23:09:09.122906image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
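
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs: the helper names (average_ranks, pearson, spearman_rho) and the toy data are assumptions, and tied values are given the mean of their rank positions, which is one common convention.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y is a nonlinear but strictly increasing function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
rho = spearman_rho(x, y)
```

Because the toy relationship is strictly increasing, the ranks of x and y coincide and ρ reaches its upper bound even though the relationship is not linear.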

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
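
As a rough illustration of this formula and of the invariance property mentioned above, here is a small pure-Python sketch. The function name pearson_r and the toy numbers are assumptions made for the example; they are not taken from the profiled dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Toy data (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 7 for a in x], [0.5 * b - 3 for b in y])
```

Up to floating-point rounding, r_original and r_transformed agree, which is the location and scale invariance in practice.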

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
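
A literal reading of this definition (sometimes called tau-a) can be sketched as follows. The function name kendall_tau_a and the toy data are assumptions; statistical libraries often use a tie-corrected variant instead, so results on data with many ties may differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, counting tied pairs as neither."""
    concordant = 0
    discordant = 0
    total_pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        total_pairs += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables move in the same direction
        elif product < 0:
            discordant += 1  # the variables move in opposite directions
    return (concordant - discordant) / total_pairs


# Toy data (hypothetical): one swapped pair among three observations.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

With three observations there are three pairs; two are concordant and one is discordant, so τ is (2 - 1) / 3.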

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
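
For readers who want to see the shape of the correction, the sketch below computes a bias-corrected Cramér's V from a contingency table of counts, following the commonly cited form of Bergsma's 2013 correction. It is an illustration under assumptions: the function name cramers_v_corrected and the toy tables are hypothetical, every row and column total is assumed to be non-zero, and the report's own implementation may differ in detail.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))


# Toy tables (hypothetical counts).
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
```

The first table has identical rows, so the corrected value is 0; the second puts all counts on the diagonal, so the value is 1 up to rounding.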

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
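
One way to read nullity correlation is as the Pearson correlation between presence indicators (1 if a value is present, 0 if it is missing) of two columns. The sketch below follows that reading; the function names and toy columns are assumptions, and the plotting library behind the report may compute or filter things differently.

```python
from math import sqrt


def is_missing(value):
    """Treat None and float NaN as missing (NaN is the only value unequal to itself)."""
    return value is None or value != value


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the presence indicators of two equally long columns.

    Returns None when either column is entirely present or entirely missing,
    because the correlation is undefined without variation.
    """
    a = [0.0 if is_missing(v) else 1.0 for v in col_a]
    b = [0.0 if is_missing(v) else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns (hypothetical): col_2 is missing exactly where col_1 is present.
col_1 = [1.0, None, 3.0, None]
col_2 = [None, 2.0, None, 4.0]
col_3 = [5.0, 6.0, 7.0, 8.0]  # fully present

opposite = nullity_correlation(col_1, col_2)
same = nullity_correlation(col_1, col_1)
undefined = nullity_correlation(col_1, col_3)
```

A value near -1 means one column tends to be missing where the other is present, a value near +1 means the two tend to be missing together, and None marks a column with no variation in missingness.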
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | purpose | title | dti | earliest_cr_line | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address
0 | 10000.0 | 36 months | 11.44 | 329.48 | B | B4 | Marketing | 10+ years | RENT | 117000.0 | Not Verified | Jan-2015 | Fully Paid | vacation | Vacation | 26.24 | Jun-1990 | 16.0 | 0.0 | 36369.0 | 41.8 | 25.0 | w | INDIVIDUAL | 0.0 | 0.0 | 0174 Michelle Gateway\r\nMendozaberg, OK 22690
1 | 8000.0 | 36 months | 11.99 | 265.68 | B | B5 | Credit analyst | 4 years | MORTGAGE | 65000.0 | Not Verified | Jan-2015 | Fully Paid | debt_consolidation | Debt consolidation | 22.05 | Jul-2004 | 17.0 | 0.0 | 20131.0 | 53.3 | 27.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1076 Carney Fort Apt. 347\r\nLoganmouth, SD 05113
2 | 15600.0 | 36 months | 10.49 | 506.97 | B | B3 | Statistician | < 1 year | RENT | 43057.0 | Source Verified | Jan-2015 | Fully Paid | credit_card | Credit card refinancing | 12.79 | Aug-2007 | 13.0 | 0.0 | 11987.0 | 92.2 | 26.0 | f | INDIVIDUAL | 0.0 | 0.0 | 87025 Mark Dale Apt. 269\r\nNew Sabrina, WV 05113
3 | 7200.0 | 36 months | 6.49 | 220.65 | A | A2 | Client Advocate | 6 years | RENT | 54000.0 | Not Verified | Nov-2014 | Fully Paid | credit_card | Credit card refinancing | 2.60 | Sep-2006 | 6.0 | 0.0 | 5472.0 | 21.5 | 13.0 | f | INDIVIDUAL | 0.0 | 0.0 | 823 Reid Ford\r\nDelacruzside, MA 00813
4 | 24375.0 | 60 months | 17.27 | 609.33 | C | C5 | Destiny Management Inc. | 9 years | MORTGAGE | 55000.0 | Verified | Apr-2013 | Charged Off | credit_card | Credit Card Refinance | 33.95 | Mar-1999 | 13.0 | 0.0 | 24584.0 | 69.8 | 43.0 | f | INDIVIDUAL | 1.0 | 0.0 | 679 Luna Roads\r\nGreggshire, VA 11650
5 | 20000.0 | 36 months | 13.33 | 677.07 | C | C3 | HR Specialist | 10+ years | MORTGAGE | 86788.0 | Verified | Sep-2015 | Fully Paid | debt_consolidation | Debt consolidation | 16.31 | Jan-2005 | 8.0 | 0.0 | 25757.0 | 100.6 | 23.0 | f | INDIVIDUAL | 4.0 | 0.0 | 1726 Cooper Passage Suite 129\r\nNorth Deniseberg, DE 30723
6 | 18000.0 | 36 months | 5.32 | 542.07 | A | A1 | Software Development Engineer | 2 years | MORTGAGE | 125000.0 | Source Verified | Sep-2015 | Fully Paid | home_improvement | Home improvement | 1.36 | Aug-2005 | 8.0 | 0.0 | 4178.0 | 4.9 | 25.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1008 Erika Vista Suite 748\r\nEast Stephanie, TX 22690
7 | 13000.0 | 36 months | 11.14 | 426.47 | B | B2 | Office Depot | 10+ years | RENT | 46000.0 | Not Verified | Sep-2012 | Fully Paid | credit_card | No More Credit Cards | 26.87 | Sep-1994 | 11.0 | 0.0 | 13425.0 | 64.5 | 15.0 | f | INDIVIDUAL | 0.0 | 0.0 | USCGC Nunez\r\nFPO AE 30723
8 | 18900.0 | 60 months | 10.99 | 410.84 | B | B3 | Application Architect | 10+ years | RENT | 103000.0 | Verified | Oct-2014 | Fully Paid | debt_consolidation | Debt consolidation | 12.52 | Jun-1994 | 13.0 | 0.0 | 18637.0 | 32.9 | 40.0 | w | INDIVIDUAL | 3.0 | 0.0 | USCGC Tran\r\nFPO AP 22690
9 | 26300.0 | 36 months | 16.29 | 928.40 | C | C5 | Regado Biosciences | 3 years | MORTGAGE | 115000.0 | Verified | Apr-2012 | Fully Paid | debt_consolidation | Debt Consolidation | 23.69 | Dec-1997 | 13.0 | 0.0 | 22171.0 | 82.4 | 37.0 | f | INDIVIDUAL | 1.0 | 0.0 | 3390 Luis Rue\r\nMauricestad, VA 00813

Last rows

(index) | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | purpose | title | dti | earliest_cr_line | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address
396020 | 10000.0 | 36 months | 9.76 | 321.55 | B | B3 | Retirement Counselor | 10+ years | RENT | 40000.0 | Not Verified | Dec-2015 | Fully Paid | debt_consolidation | Debt consolidation | 23.40 | Jan-1988 | 9.0 | 0.0 | 8819.0 | 57.3 | 18.0 | w | INDIVIDUAL | 1.0 | 0.0 | 914 Alexander Mountains Apt. 604\r\nEast Marco, VT 70466
396021 | 3200.0 | 36 months | 5.42 | 96.52 | A | A1 | St Francis Medical Center | 10+ years | RENT | 33000.0 | Not Verified | Feb-2011 | Fully Paid | debt_consolidation | 2011 Insurance and Debt Consolidation | 21.45 | Nov-1996 | 18.0 | 0.0 | 3985.0 | 7.6 | 50.0 | f | INDIVIDUAL | NaN | 0.0 | 309 John Mission\r\nWest Marc, NY 00813
396022 | 12000.0 | 36 months | 12.29 | 400.24 | C | C1 | Data Center Specialist II | 1 year | RENT | 52100.0 | Source Verified | Oct-2015 | Fully Paid | debt_consolidation | Debt consolidation | 17.28 | Oct-2004 | 6.0 | 0.0 | 9580.0 | 66.1 | 18.0 | w | INDIVIDUAL | 0.0 | 0.0 | 532 Johnson Drive Apt. 185\r\nAndersonside, NY 70466
396023 | 22000.0 | 36 months | 18.92 | 805.55 | D | D4 | Operations Manager | 10+ years | MORTGAGE | 138000.0 | Not Verified | Apr-2014 | Fully Paid | debt_consolidation | Debt consolidation | 24.43 | May-1998 | 18.0 | 0.0 | 22287.0 | 50.4 | 39.0 | f | INDIVIDUAL | 4.0 | 0.0 | 0297 Flores Dale Suite 441\r\nTaylorland, MD 05113
396024 | 6000.0 | 36 months | 13.11 | 202.49 | B | B4 | Michael's Arts & Crafts | 5 years | RENT | 64000.0 | Not Verified | Mar-2013 | Fully Paid | debt_consolidation | Credit buster | 10.81 | Nov-1991 | 7.0 | 0.0 | 11456.0 | 97.1 | 9.0 | w | INDIVIDUAL | 0.0 | 0.0 | 514 Cynthia Park Apt. 402\r\nWest Williamside, SC 05113
396025 | 10000.0 | 60 months | 10.99 | 217.38 | B | B4 | licensed bankere | 2 years | RENT | 40000.0 | Source Verified | Oct-2015 | Fully Paid | debt_consolidation | Debt consolidation | 15.63 | Nov-2004 | 6.0 | 0.0 | 1990.0 | 34.3 | 23.0 | w | INDIVIDUAL | 0.0 | 0.0 | 12951 Williams Crossing\r\nJohnnyville, DC 30723
396026 | 21000.0 | 36 months | 12.29 | 700.42 | C | C1 | Agent | 5 years | MORTGAGE | 110000.0 | Source Verified | Feb-2015 | Fully Paid | debt_consolidation | Debt consolidation | 21.45 | Feb-2006 | 6.0 | 0.0 | 43263.0 | 95.7 | 8.0 | f | INDIVIDUAL | 1.0 | 0.0 | 0114 Fowler Field Suite 028\r\nRachelborough, LA 05113
396027 | 5000.0 | 36 months | 9.99 | 161.32 | B | B1 | City Carrier | 10+ years | RENT | 56500.0 | Verified | Oct-2013 | Fully Paid | debt_consolidation | pay off credit cards | 17.56 | Mar-1997 | 15.0 | 0.0 | 32704.0 | 66.9 | 23.0 | f | INDIVIDUAL | 0.0 | 0.0 | 953 Matthew Points Suite 414\r\nReedfort, NY 70466
396028 | 21000.0 | 60 months | 15.31 | 503.02 | C | C2 | Gracon Services, Inc | 10+ years | MORTGAGE | 64000.0 | Verified | Aug-2012 | Fully Paid | debt_consolidation | Loanforpayoff | 15.88 | Nov-1990 | 9.0 | 0.0 | 15704.0 | 53.8 | 20.0 | f | INDIVIDUAL | 5.0 | 0.0 | 7843 Blake Freeway Apt. 229\r\nNew Michael, FL 29597
396029 | 2000.0 | 36 months | 13.61 | 67.98 | C | C2 | Internal Revenue Service | 10+ years | RENT | 42996.0 | Verified | Jun-2010 | Fully Paid | debt_consolidation | Toxic Debt Payoff | 8.32 | Sep-1998 | 3.0 | 0.0 | 4292.0 | 91.3 | 19.0 | f | INDIVIDUAL | NaN | 0.0 | 787 Michelle Causeway\r\nBriannaton, AR 48052